-
Notifications
You must be signed in to change notification settings - Fork 536
FIX: Restore generate_gantt_chart functionality #3290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
As noted
[T]here is an issue with the number of threads being estimated by the callback, or the gantt chart creation script is pulling in the wrong numbers. Some of the nodes are reporting using 210 threads!
Originally posted by @ccraddock in FCP-INDI/C-PAC#1404 (comment)
I thought maybe
runtime_threadswas counting something different than I expected.
I see the profile uses cpu_percent for runtime_threads which returns a percentage of a CPU, so I think something like math.ceil(cpu_percent)/100 would be an estimate of the number of threads, but there's some disconnected code that looks like it collects the actual number of threads used (as opposed to percentage of 1 CPU).
Originally posted by @shnizzedy in FCP-INDI/C-PAC#1404 (comment)
I think estimating the number of threads (by dividing by
cpu_percent100 and rounding up) is good enough for what I'm trying to do.
Originally posted by @shnizzedy in FCP-INDI/C-PAC#1404 (comment)
I think the issues of
- what
runtime_threadsis logging and - whether the number of threads used by a node is recorded
are related to this PR and issue, but beyond the scope of these changes. C-PAC has its own callback function in which I'm dividing and rounding, so I made no changes regarding runtime_threads in Nipype.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@ ## master #3290 +/- ## ========================================== + Coverage 72.86% 73.15% +0.28% ========================================== Files 1278 1278 Lines 59305 59356 +51 ========================================== + Hits 43212 43419 +207 + Misses 16093 15937 -156 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable, though I don't have any experience with this bit of the code. Inclined to merge tomorrow unless someone complains.
My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just some minor nits.
My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?
I agree 👍
Ref nipy#3290 (comment), nipy#3290 (comment) Co-authored-by: Mathias Goncalves <goncalves.mathias@gmail.com>
My only hesitance is the potentially misleading runtime_threads ― maybe that should be fixed before restoring this functionality?
I agree
Was this fixed? What needs doing?
Was this fixed? What needs doing?
I haven't fixed it (yet at least). The issue is that the chart uses runtime_threads from the callback log as a count of threads observed being used at runtime, but the value actually stored there is cpu_percent,
nipype/nipype/utils/profiler.py
Line 143 in e9217c2
a float representing the current process CPU utilization as a percentage
This leads to thread counts in the hundreds when they're expected to be in the ones, like Gantt chart screenshot with CPU percent in "Threads"
So I think the "threads" part of these charts should be changed before the chart functionality is restored, either
- by updating the log to include an integer count of threads and use this value in the chart
- change the column from threads to CPU percentage
- something else?
Yeah, seems like we want something like:
if status_dict['runtime_threads'] != "N/A":
status_dict['runtime_threads'] //= 100
An existing unit test does
nipype/nipype/interfaces/base/tests/test_resource_monitor.py
Lines 76 to 78 in 6c06030
which is similar to what we're doing for now in C-PAC:
if runtime_threads != 'N/A': runtime_threads = math.ceil(runtime_threads/100)
My concern is that, as I read
Note: the returned value can be > 100.0 in case of a process running multiple threads on different CPU cores.
Note: the returned value is explicitly not split evenly between all available CPUs (differently frompsutil.cpu_percent()). This means that a busy loop process running on a system with 2 logical CPUs will be reported as having 100% CPU utilization instead of 50%. This was done in order to be consistent withtopUNIX utility and also to make it easier to identify processes hogging CPU resources independently from the number of CPUs. It must be noted thattaskmgr.exeon Windows does not behave like this (it would report 50% usage instead). To emulate Windowstaskmgr.exebehavior you can do:p.cpu_percent() / psutil.cpu_count().
― psutil documentation: Process.cpu_percent
this number can be a misleading estimate. For example, if a process is using 25% of each of 4 CPUs, I believe this would report 100%, which would reduce to 1 or 2 threads depending on how we're rounding up or not. I'd be happy to learn that either I'm misunderstanding the number or that the number is good enough.
@shnizzedy Can you rebase/merge master to resolve conflicts? I think we let this go too long and should just merge and let people find bugs and fix them.
Ref nipy#3290 (comment), nipy#3290 (comment) Co-authored-by: Mathias Goncalves <goncalves.mathias@gmail.com>
933fad3 to
6490708
Compare
* exclude nodes without timing information from Gantt chart * fall back on "id" or empty string if no "name" in node
Co-authored-by: Mathias Goncalves <goncalves.mathias@gmail.com>
Co-authored-by: Mathias Goncalves <goncalves.mathias@gmail.com>
Ref nipy#3290 (comment), nipy#3290 (comment) Co-authored-by: Mathias Goncalves <goncalves.mathias@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
6490708 to
6830e3a
Compare
e644bdd to
1bce774
Compare
1bce774 to
19a0355
Compare
b35aa95 to
4c0835f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing tests! Small comments.
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Co-authored-by: Chris Markiewicz <effigies@gmail.com>
Do we want to do this
if status_dict['runtime_threads'] != "N/A": status_dict['runtime_threads'] //= 100
in this PR or kick the can on it?
Let's kick the can. If you want to open another PR in 5 minutes, that's fine with me. Last time we had that question, it delayed things 3 years.
Uh oh!
There was an error while loading. Please reload this page.
Summary
Fixes #2982. Maybe fixes #3527.
(削除) All tests pass locally. 8/13 jobs pass on Travis. The Travis failures seem unrelated to the changes in this PR. (削除ここまで)After rebase, all tests pass.
List of changes proposed in this PR (pull-request)
nipype/nipype/utils/draw_gantt_chart.py
Line 391 in fa65caf
name, useidor an empty string fornameinstead of crashingstartandfinishvalues like"N/A"and"Unknown"nipype/nipype/utils/draw_gantt_chart.py
Lines 144 to 147 in a08ee57
nipype/nipype/utils/draw_gantt_chart.py
Lines 151 to 154 in a08ee57
nipype/nipype/pipeline/plugins/tests/test_callback.py
Lines 66 to 98 in fa65caf
Acknowledgment